Welcome to the questionnaire design guide!
An aim of this course is to develop your ability to translate business problems into actionable research questions and to design an adequate research plan to answer them. Therefore, you need to be equipped with knowledge of how to create a survey and properly conduct research.
Generally, what you can expect from survey design is similar to what one experiences in a relationship: if you try to take more than you commit, it does not work out. On a serious note, if you follow the guidelines mentioned here, you will avoid the usual traps your fellow colleagues were caught in.
In a research process, conducting a survey is part of (primary) data collection. Before we collect data, we have to make sure that the preceding steps are done correctly. In the following sections, however, we will focus on the process of designing a questionnaire. Ultimately, you will be able to collect relevant data and apply appropriate statistical tests.
A structured questionnaire is a research instrument designed to elicit specific information from a sample of a target population. Usually it is used in a standardized way with fixed-alternative questions (same questions and response options for all respondents).
The objective of a questionnaire is threefold:
In order to meet these objectives, a questionnaire design process suggests the following sequence of steps:
The questionnaire design should be aligned with the research design! To achieve this alignment, it is necessary to review the components of the problem and the approach. In particular, you should review the research questions, hypotheses, and characteristics that influence the research design.
If you are interested in the causal effect of one particular (independent) variable on another (dependent) variable, think about an experimental design that might allow you to manipulate this variable. In this case, you particularly have to decide on the following:
What you need to be careful about is the effect of reverse causation. This refers to the situation where the causal relationship could possibly run in the opposite direction from what was assumed in the first place. For instance, it is often assumed that an increase in individual income leads to an increase in well-being (happiness). However, some research suggests that this causation could run the opposite way, i.e. that an increase in an individual's well-being actually leads to an increase in income.
Here are some examples of causal research design applications:
If you would like to analyze the effects of multiple categorical or continuous (independent) variables on one continuous (dependent) variable, you might use a regression model. When doing this, you particularly have to decide on:
How to measure the dependent variable (DV). This is particularly important, since you need a variable that is powerful in uncovering variation between subjects (e.g., open-ended questions such as “How much are you willing to pay for this product?” are good candidates). Moreover, you also need to consider the nature of your DV, i.e. whether it is an interval, ordinal, or categorical variable. The nature of your DV will heavily influence your choice of a correct statistical test.
How to measure the independent variables (IVs) (single-item vs. multi-item scales, categorical vs. continuous). Bear in mind that the nature of the IVs, together with the DV, affects your choice of a statistical test as well.
What other variables might cause the effect you would like to investigate (to prevent omitted variable bias, i.e. bias caused by variables that are not part of your model but still influence the dependent variable).
Potential interactions (e.g., is the effect of variable X stronger for group A vs. B?)
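The considerations above can be sketched in R. This is a minimal, hedged illustration using simulated data; all variable names (wtp, age, group) are hypothetical and not part of the course dataset:

```r
# Hypothetical sketch: regression of a continuous DV (willingness to pay)
# on a continuous IV (age) and a categorical IV (group), with interaction.
set.seed(42)
d <- data.frame(
  wtp   = rnorm(100, mean = 20, sd = 5),                     # continuous DV
  age   = runif(100, min = 18, max = 65),                    # continuous IV
  group = factor(sample(c("A", "B"), 100, replace = TRUE))   # categorical IV
)

# 'age * group' expands to both main effects plus the age:group interaction,
# i.e. it tests whether the effect of age differs between groups A and B
model <- lm(wtp ~ age * group, data = d)
summary(model)
```

The interaction term directly addresses the last point above: a significant age:group coefficient would indicate that the effect of age on the DV is stronger in one group than in the other.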
In the next step you should review the type of interviewing method you will use. At this point you need to think about the setting in which you aim to conduct your survey: for instance, should you do it face-to-face or online? Here you can find some advantages and disadvantages of online surveys:
Additionally, here is the list of the online tools you can use to conduct an online survey (usually for free):
In this step you start working on the content of your questions. There are several questions you should ask yourself when writing them:
In your survey, try to avoid asking double-barrelled questions. These are single questions that attempt to cover two issues. Such questions can be confusing to respondents and result in ambiguous responses. Instead, you might ask multiple questions in order to obtain the intended information.
Incorrect:
Do you think Nike Town offers better variety and prices than other Nike stores?
Correct:
Do you think Nike Town offers better variety than other Nike stores?
Do you think Nike Town offers better prices than other Nike stores?
The quality of the collected data highly depends on your ability to address the correct participants. Therefore, you need to make sure that your respondents are able to meaningfully answer your questions.
Examples:
If you are asking participants to recall certain brands, for instance, make sure you use an unaided recall question:
Example of unaided recall question:
What brands of soft drinks do you remember being advertised on TV last night?
Example of aided recall question:
Which of these brands were advertised last night on TV?
a) Coca-Cola
b) Pepsi
c) Red Bull
d) Evian
e) Don’t know
If you are asking participants to list something, it is good practice to minimize the effort required of respondents:
Incorrect:
Please list all the departments from which you purchased merchandise on your most recent shopping trip to department store X.
Correct:
Please check all the departments from which you purchased merchandise on your most recent shopping trip to a department store:
a) Women’s dresses
b) Men’s apparel
c) Children’s apparel
d) Cosmetics
e) Jewelry
f) Other (please specify) ___________
If you are asking for information that could be considered sensitive (e.g., money, family life, political beliefs, religion), such questions should come at the end of the questionnaire. Moreover, it is advisable to provide response categories rather than asking for specific figures:
Incorrect:
What is your household’s exact annual income?
Correct:
Which one of the following categories best describes your household’s annual gross income?
a) under 25.001 €
b) 25.001€ to 50.000 €
c) 50.001€ to 75.000 €
d) 75.001€ to 100.000 €
e) over 100.000 €
Every statistical analysis requires that variables have a specific level of measurement. The measurement scales you choose for your survey questions will affect the answers you get and, eventually, the statistical tests you can apply. For instance, it would not make sense to compute an average of genders: an average of a categorical variable does not carry much meaning. Moreover, if you tried to compute the average of genders coded as numeric values (e.g., male = 0, female = 1), the output would not be meaningfully interpretable.
Therefore, it is crucial to become familiar with the possibilities of each scale before you add another question to your survey. This significantly lowers the chances of obtaining data you did not intend to collect and of being unable to apply the tests you intended.
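A quick R illustration of the gender example above, using made-up codes (male = 0, female = 1):

```r
# Hypothetical toy data: gender coded numerically (male = 0, female = 1)
gender_numeric <- c(0, 1, 1, 0, 1)
mean(gender_numeric)   # returns 0.6, but "0.6 genders" has no meaning

# Declaring the variable a factor tells R it is nominal;
# mean(gender_factor) would then return NA with a warning
gender_factor <- factor(gender_numeric, levels = c(0, 1),
                        labels = c("male", "female"))
table(gender_factor)   # counts are the appropriate summary for nominal data
```

Once the variable is a factor, R refuses to compute nonsensical summaries, which protects you from applying tests that assume a metric scale.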
In the following table you can get a quick overview of the possibilities of each measurement scale:
In the table below you can find a general procedure for choosing a correct analysis based on the measurement scale of your data and the number of variables. It shows the statistical analyses we covered during the course and aims to help you choose among them based on the nature of your dependent variable on the one side, and the nature and number of your independent variables on the other:
Scaling techniques are meant to study the relationships between objects. The basic classification of scaling techniques is into comparative and noncomparative scales.
With a noncomparative scale, each object is scaled independently of the other objects. The resulting data are generally assumed to be interval or ratio scaled.
Comparative scales (or nonmetric scaling) directly compare the stimulus objects. For example, the respondent might be asked directly about his or her preference between domestic and foreign beer brands. As a result, the comparative data collected can only be interpreted in relative terms. In the following sections we will walk through both types of scales and briefly introduce them.
In the table below you can find a couple of commonly measured constructs in marketing research such as attitude, importance, purchase intention and similar.
Typically, participants rate objects on a number of itemized, seven-point rating scales bounded at each end by one of two bipolar adjectives.
The semantic differential can measure respondents’ attitudes towards something (products, concepts, items, people…).
It helps you find where the respondent’s position lies on a scale between two bipolar adjectives such as “Sweet-Sour” or “Bright-Dark”. In comparison to the Likert scale, which uses generic response options (e.g., extremely dissatisfied to extremely satisfied), semantic differential questions are posed within the context of evaluating attitudes.
It is a widely used rating scale in marketing research due to its versatility.
When creating a semantic differential question, you should consider the following:
The sequence of questions in a questionnaire can play an important role. For instance, more sensitive questions (such as demographic questions) are usually placed at the end, as they can trigger a change in the respondent’s behavior.
If you plan to conduct an online survey, think about the respondent’s experience while completing your questionnaire. For instance, spread the content over several short pages rather than a few long ones. In online surveys, two questions per page is a useful rule of thumb. Generally, respondents are reluctant to read and fill out long questionnaire pages, so long pages will lead to a higher dropout rate. To reduce the dropout rate, state how long the survey will approximately take in the introduction of the questionnaire. Note that tools like Qualtrics provide the estimated response time in the survey overview.
Consider that most people fill out surveys on their phones. Think about how the questionnaire will appear on a phone screen too; in that regard, pay particular attention to the length of your questions.
In the end, the questionnaire structure has to be aligned with the research design. For example, if your research design features an experiment, this needs to be reflected in the questionnaire (e.g., you need to assign the respondents randomly to the experimental conditions in case of a between-subjects comparison).
In a between-subject design you randomly assign each respondent to different experimental conditions. They would then complete tasks only in the condition to which they are assigned.
For instance, suppose we would like to test the effect of two advertisements on purchase intention. One group of (randomly assigned) respondents will be exposed to one advertisement version, while the other group (of randomly assigned respondents) will be exposed to the other version. After that, both groups of respondents express their willingness to buy the advertised product. Eventually, if the dependent variable (e.g., willingness to buy) is measured on an interval or ratio scale, you can use an independent t-test to compare the group means. The whole experimental design should be organised as follows:
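The between-subjects comparison described above can be sketched in R. The ratings are simulated here; in a real study they would be the two groups' observed willingness-to-buy scores:

```r
# Simulated willingness-to-buy ratings; each respondent saw only ONE version
set.seed(1)
willingness_A <- rnorm(50, mean = 4.0, sd = 1)  # group shown advertisement A
willingness_B <- rnorm(50, mean = 4.5, sd = 1)  # group shown advertisement B

# Independent (two-sample) t-test comparing the group means
tt_between <- t.test(willingness_A, willingness_B)
tt_between
```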
This type of experimental design involves exposing each respondent to all of the experimental conditions you are testing. This way, each respondent completes all of the conditions.
For instance, suppose we would like to test the effect of two advertisements on purchase intention again, but this time in a within-subject design. First, each respondent is exposed to the first version of the advertisement and, right after that, asked to rate his/her willingness to buy the advertised product. Subsequently, each participant is shown the other version of the advertisement and again rates his/her willingness to purchase the advertised product. Finally, we can compare the group means with a paired-sample t-test (given that the data are measured on an interval or ratio scale).
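The within-subjects comparison can be sketched in R with simulated data (hypothetical ratings; the second measurement is made correlated with the first, as is typical for repeated measures on the same respondents):

```r
# Simulated ratings; each respondent rated BOTH advertisement versions
set.seed(2)
rating_v1 <- rnorm(50, mean = 4.0, sd = 1)
rating_v2 <- rating_v1 + rnorm(50, mean = 0.3, sd = 0.5)  # correlated repeat measure

# Paired-sample t-test on the two measurements from the same respondents
tt_within <- t.test(rating_v1, rating_v2, paired = TRUE)
tt_within
```

Note that `paired = TRUE` is what distinguishes this from the between-subjects case: the test is performed on the within-respondent differences rather than on two independent samples.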
Generally, question wording should enable each respondent to understand the questions and answer them reliably. Reliability means that, if a respondent were asked the same question again, he/she would give the same answer. A number of common problems regarding question wording have been identified, so we will address the most important ones.
In order to ensure reliability, the who, what, when, and where should be defined in each question.
Example: Which brand of shampoo do you use?
Who (the respondent): It is not clear whether this question relates to the individual respondent or the respondent’s total household.
What (the brand of shampoo): It is unclear how the respondent is to answer this question if more than one brand is used.
When (unclear): The time frame is not specified in this question. The respondent could interpret it as meaning the shampoo used this morning, this week, or over the past year.
Where (not specified): At home, at the gym? Where?
A more clearly defined question is:
Which brand or brands of shampoo have you personally used at home during the last month? In the case of more than one brand, please list all the brands that apply.
Use ordinary words. Words should match the vocabulary level of the participants.
Incorrect:
“Do you think the distribution of soft drinks is adequate?”
Correct:
“Do you think soft drinks are easily available when you want to buy them?”
Avoid double negative form. Double negative question forms can confuse respondents, especially when they need to answer with “Agree” or “Disagree”.
Incorrect:
Do you think that it is not uncommon that boys play basketball?
Correct:
In your opinion, is it common that boys play basketball?
Avoid leading questions. Leading questions clue the participant in to what the answer should be. Such questions introduce a bias in a particular direction.
Incorrect:
“Is Colgate your favorite toothpaste?”
Correct:
“What is your favorite brand of toothpaste?”
Avoid ambiguous words. Words such as usually, normally, frequently, often, regularly, and other similar words, do not define frequency clearly enough.
Incorrect:
“In a typical month, how often do you go to a movie theater to see a movie?”
a) Never
b) Occasionally
c) Sometimes
d) Often
e) Regularly
Correct:
“In a typical month, how often do you go to a movie theater to see a movie?”
a) Less than once
b) 1 or 2 times
c) 3 or 4 times
d) More than 4 times
One of the last steps in the process of designing a questionnaire is choosing an adequate order of questions and instructions for respondents.
At the beginning, you should provide a short and easy-to-understand introduction to the topic. Use simple language and avoid technical terms (e.g., not many people will know the terms “manufacturer brand” and “store brand”). Additionally, in the introduction you should state how long the survey will approximately take.
The opening questions should be interesting, simple, and non-threatening. They are crucial because they are the respondent’s first exposure to the questionnaire and are likely to set the tone for the rest of the questions. If they are too difficult to understand, or sensitive in some way, respondents are likely to stop answering your questions. Qualifying questions (or screening questions) should serve as the opening questions (if applicable). Their purpose is to identify potential respondents who are eligible to proceed with the research survey.
After the opening part, you should establish an optimal question flow. General questions should precede specific ones. Questions on one subject, or on one particular aspect of a subject, should be grouped together. Respondents may find it confusing to be asked to return to a subject they thought they had already given their opinion about.
As respondents are moving towards the end of the questionnaire, they are likely to become increasingly indifferent and might give careless answers. Therefore, questions of special importance should ideally be included in the earlier part of the questionnaire.
In addition, you should pay particular attention to providing all prescribed definitions and explanations before you ask a question. This ensures that the questions are understood in a consistent way by every respondent.
Finally, before you distribute the final questionnaire, there are some things to consider. First, you should always pretest your questionnaire before sharing it! Test all aspects of the questionnaire (content, wording, sequence, form & layout, etc.). If possible, use respondents in the pretest that are similar to those who will be included in the actual survey. Ideally, the pretest sample size should be small (in a real scenario this could vary from 15 to 30 respondents; for the group project, a lower number will be sufficient). After each significant revision of the questionnaire, conduct another pretest, using a different sample of respondents. Eventually, code and analyze the responses obtained from the pretest to make sure that you collected the information you intended to collect.
After testing your questionnaire you should be able to determine whether:
Questionnaire creation in Qualtrics starts with the creation of a Qualtrics project. Each project consists of a survey, a distribution record, and a collection of responses and reports. There are three ways to create a questionnaire. First, you can create a new survey project from scratch. Second, you can create a new questionnaire from a copy of an existing questionnaire. Finally, you can create one from a template in your Survey Library, or from an exported QSF file.
In order to create a completely new questionnaire, you need to do the following:
Go to the Projects page by clicking the Qualtrics XM logo or clicking Projects on the top-right.
Create new project by clicking the blue button on the right side.
In the “Create your own” section click on the survey button.
Enter a name for your survey and get started with a survey creation.
If you would like to create a new questionnaire on the basis of an already existing one, choose “From a Copy”. Subsequently, you need to indicate the questionnaire you would like to copy. Now you are good to go!
If there is a questionnaire in the Qualtrics Library you would like to use, then you need to choose “From Library”, and indicate one library name in the dropdown menu.
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
chisq.test, fisher.test
Parsed with column specification: cols( .default = col_double(), StartDate = col_character(), EndDate = col_character(), IPAddress = col_logical(), RecordedDate = col_character(), ResponseId = col_character(), RecipientLastName = col_logical(), RecipientFirstName = col_logical(), RecipientEmail = col_logical(), ExternalReference = col_logical(), LocationLatitude = col_character(), LocationLongitude = col_character(), DistributionChannel = col_character(), UserLanguage = col_character(), Q7_MC_sa_country_3_TEXT = col_logical(), Q23_Gender_3_TEXT = col_logical(), Condition = col_character() ) See spec(…) for full column specifications.
In this chapter we will examine the nature of the data you collect when conducting a survey. It will help you choose a type of question depending on the nature of the data you want to collect and on the type of statistical tests you want to apply.
Multiple Choice with a single answer is a type of closed-ended question that lets respondents select one answer from a defined list of choices.
The type of data you obtain is categorical, and the output comes in the following form:
| In a typical week, how many hours do you spend watching movies or TV series on Netflix? |
|---|
| 3 |
| 4 |
| 5 |
| 4 |
| 5 |
| 2 |
What to do with this data now? First, we need to load it into R and prepare it for analysis. R recognizes the numbers you see in the output as numeric. In order to conduct statistical modeling and properly visualize our results, we need to convert our data to the factor class.
A factor (or coding variable) represents different groups of data by using numbers (integers). Factors appear as numeric variables, but they hold the meaning of labels/names of data groups, i.e. they represent a nominal variable. These data groups are represented in the form of ‘levels’.
In our case, our multiple choice question output will contain five data groups (‘Never’, ‘1-2 hours’, ‘3-4 hours’, ‘5-6 hours’, ‘more than 6 hours’) after converting it to a factor:
# Convert numeric value to factors
qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?' <- factor(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?', levels = c(1:5), labels = c('Never','1-2 hours','3-4 hours','5-6 hours','more than 6 hours'))
qualtrics$` Selected Choice_1` <- factor(qualtrics$` Selected Choice_1`,levels = c(1:2),labels = c("Male","Female"))
qualtrics$` Selected Choice` <- factor(qualtrics$` Selected Choice`, levels = c(1:2), labels=c("Austria","Germany"))
# Table
table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')
Never 1-2 hours 3-4 hours 5-6 hours more than 6 hours
19 18 22 35 23
table(qualtrics$` Selected Choice`) #countries
Austria Germany
35 82
table(qualtrics$` Selected Choice_1`) #gender
Male Female
49 68
Second, you might want to visualize your results. In order to do so, the data needs to be in the appropriate format. Here we proceed with the data format adaptation from the point where we stopped:
# Converting long format to the visualisation-friendly format
mlc_visualisation <- as.data.frame(table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?'))
# Naming columns
names(mlc_visualisation) <- c('Time','Count')
# Observing
knitr::kable(mlc_visualisation)
| Time | Count |
|---|---|
| Never | 19 |
| 1-2 hours | 18 |
| 3-4 hours | 22 |
| 5-6 hours | 35 |
| more than 6 hours | 23 |
The simplest way to visualize data obtained from multiple choice question with a single answer is a bar chart:
## Basic bar chart
labels <- as.character(mlc_visualisation$Time) #Save labels for x-axis in the barplot
barplot(mlc_visualisation$Count, # Column to visualize
xlab='Time', # X-axis label
ylab = 'Count(answers)', # Y-axis label
names.arg = labels,
main = 'How many hours do you spend watching movies or series on Netflix?') # Title
R package ggplot2 allows you to create visually appealing graphs:
## ggplot2 bar chart
library(ggplot2)
p <- ggplot(data=mlc_visualisation,
aes(x=Time, y=Count, fill=Time)) +
geom_bar(stat='identity') + theme_minimal() + labs(title = "In a typical week, how many hours do you spend watching movies or series on Netflix?")
p
Another R library which can help you make interactive charts in a minute is plotly. Here we use a function called ggplotly(), which allows you to turn any ggplot2 chart into an interactive one. Since we have already created a bar chart using ggplot2 and saved it as “p”, we will simply turn it into a plotly graph:
## ggplotly bar chart
library(plotly)
ggplotly(p)
A package built on ideas similar to ggplot2 is ggvis, which is still under development:
## ggvis bar chart
library(ggvis)
mlc_visualisation %>%
  ggvis(x = ~Time, y = ~Count, fill = ~Time) %>%
  layer_bars()
The data collected from the previous question are ordinal, as we are able to impose a natural order on the levels. Since ordinal data is a type of categorical data, we can analyze it with the Chi-square test, or with Fisher’s exact test if the expected count in some cell is less than 5.
Example: We would like to know whether a number of hours spent watching Netflix depends on the respondents’ country of origin.
# Creation of contingency table
fisher_test_table <-table(qualtrics$` Selected Choice`,qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')
# Check what our contingency table looks like
fisher_test_table
Never 1-2 hours 3-4 hours 5-6 hours more than 6 hours
Austria 3 7 6 11 8
Germany 16 11 16 24 15
# Since we have a count less than 5, we should apply Fisher's test instead of Chi-square.
# Fisher's test
test <- fisher.test(fisher_test_table)
test
Fisher's Exact Test for Count Data
data: fisher_test_table
p-value = 0.575
alternative hypothesis: two.sided
# p-value
test$p.value
[1] 0.5750401
From the output and from test$p.value we see that the p-value is higher than the significance level of 5%. As with any other statistical test, if the p-value is higher than the significance level, we cannot reject the null hypothesis.
In our case, not rejecting the null hypothesis for the Fisher’s exact test of independence means that there is no significant relationship between the two categorical variables. Therefore, knowing the value of one variable does not help to predict the value of the other variable.
# Creating table
(mlc_chi_square <- table(qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?'))
Never 1-2 hours 3-4 hours 5-6 hours more than 6 hours
19 18 22 35 23
# Chi-square test (without given expected values = equal values )
chisq.test(mlc_chi_square)
Chi-squared test for given probabilities
data: mlc_chi_square
X-squared = 7.9145, df = 4, p-value = 0.09476
The p-value of the test is higher than 0.05, so the observed distribution does not differ significantly from the expected (uniform) one. In other words, the numbers of respondents who spend different amounts of hours watching Netflix are roughly evenly distributed. This result is not surprising if you look at the counts for each level in the table we created before conducting the test: the counts do not deviate much from one another. This is also visible in the previous visualisations.
If we are interested in testing a more specific distribution, e.g. we expect that 40% of our respondents watch Netflix 3-4 hours a week, we can specify the corresponding distribution in the test.
# Expected values in percentages for each alternative. The sum must be 1.
expected_values <- c(0.10, # We expect that 10% of our respondents do not watch Netflix at all ("Never").
0.20, # We expect that 20% of our respondents watch Netflix 1-2 hours a week.
0.40, # We expect that 40% of our respondents watch Netflix 3-4 hours a week.
0.20, # We expect that 20% of our respondents watch Netflix 5-6 hours a week.
0.10 # We expect that 10% of our respondents watch Netflix more than 6 hours a week.
)
# Chi-square test with expected values
chisq.test(mlc_chi_square, p=expected_values)
Chi-squared test for given probabilities
data: mlc_chi_square
X-squared = 35.607, df = 4, p-value = 3.486e-07
This time the p-value of the test is lower than 0.05. We have evidence that the observed distribution differs significantly from the expected distribution (10%/20%/40%/20%/10%).
# Creation of contingency table
chi_square_table <-table(qualtrics$` Selected Choice_1`,qualtrics$'In a typical week, how many hours do you spend watching movies or TV series on Netflix?')
# Chi-square independence test
chisq.test(chi_square_table)
Pearson's Chi-squared test
data: chi_square_table
X-squared = 1.5739, df = 4, p-value = 0.8135
Since the p-value (0.8135) is higher than the significance level (0.05), we cannot reject the null hypothesis. Thus, we conclude that there is no association between gender and the number of hours spent watching Netflix. In other words, the hours spent watching Netflix are independent of the participant’s gender.
Before we conduct any test, we will do some simple calculations and visualise our data.
# Rename columns
colnames(qualtrics)[38] <- "ja!Naturlich"
colnames(qualtrics)[39] <- "Clever"
colnames(qualtrics)[40] <- "Spar Vital"
colnames(qualtrics)[41] <- "..."
# Replacing NA with 0
qualtrics$`ja!Naturlich`[is.na(qualtrics$`ja!Naturlich`)]=0
qualtrics$Clever[is.na(qualtrics$Clever)]=0
qualtrics$`Spar Vital`[is.na(qualtrics$`Spar Vital`)]=0
qualtrics$...[is.na(qualtrics$...)]=0
# Calculating frequency, percentage of respondents and percentage of cases
df.cochran <- data.frame(Frequency = colSums(qualtrics[38:41]),
Share_of_respondents = (colSums(qualtrics[38:41])/sum(qualtrics[38:41]))*100,
Share_of_cases =((colSums(qualtrics[38:41]))/nrow(qualtrics[38:41]))*100)
# Observing
df.cochran
# Visualisation
barplot(df.cochran[,3], names.arg = row.names(df.cochran), main = "% of Respondents familiar with brands", xlab = "Brand",ylab = "Percentage")
The visualisation above shows that more than 60% of respondents are familiar with the brand “ja!Naturlich”, while we cannot say the same for the other brands considered in our question.
For the analysis of results collected with a multiple choice question with multiple possible answers, we can use Cochran’s Q test. Although we did not mention it before, it is not too different from what you have already learned about other tests.
The Cochran’s Q test and the associated multiple comparisons require the following assumptions:
1. Responses are dichotomous and come from k matched samples.
2. The subjects are independent of one another and were selected at random from a larger population.
3. The sample size is sufficiently “large”. (As a rule of thumb, the number of subjects n for which the responses are not all 0’s or 1’s should be ≥ 4, and nk should be ≥ 24.)
In a within-subjects experimental design with three or more observations of a dichotomous (i.e., just two levels, such as “Yes” or “No”) categorical outcome, you use Cochran’s Q test to assess main effects. Similarly, in our multiple choice question with multiple answers, the same respondent goes through three or more potential answers with a dichotomous (yes/no) categorical outcome.
library(nonpar)
# Creation of a matrix with one column per brand
matrix.cochran <- cbind(qualtrics$`ja!Naturlich`,
                        qualtrics$Clever,
                        qualtrics$`Spar Vital`,
                        qualtrics$`...`)
# Turning NAs to 0
matrix.cochran[is.na(matrix.cochran)] = 0
# Cochran's Q test
cochrans.q(matrix.cochran, alpha = 0.05)
A p-value less than 0.05 indicates that there is enough evidence to conclude that some of the store brands are better known among our respondents than others. In order to take a closer look at this, we need to conduct a post hoc test.
library(DescTools)
list.cochran <- list(qualtrics$`ja!Naturlich`,
qualtrics$Clever,
qualtrics$`Spar Vital`,
qualtrics$...) # imaginary brand
# Replacing NAs in the list with 0 in order to be able to run the test
list.cochran <- rapply(list.cochran, f=function(x) ifelse(is.na(x),0,x), how="replace" )
# Post hoc test (Dunn Test)
DunnTest(list.cochran, method="bonferroni")
Dunn's test of multiple comparisons using rank sums : bonferroni
mean.rank.diff pval
2-1 -36 0.1093
3-1 -18 1.0000
4-1 -74 7.3e-06 ***
3-2 18 1.0000
4-2 -38 0.0761 .
4-3 -56 0.0014 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the results of the Dunn test, we can see that there is a significant difference between 1 (“ja!Natürlich”) and 4 (“…”), as well as between 3 (“Spar Vital”) and 4 (“…”).
A rank order question asks respondents to compare items to each other by placing them in order of preference. Note that the data obtained from a rank order question show the order of a respondent’s preference, but not the distance between items. For instance, if the most important feature of a fitness tracker for a respondent is “Measuring steps” and the second most important is “Calories burned”, we do not know how much more important the former is than the latter.
An intuitive question to ask is the following: which feature of the fitness tracker is the most important to our respondents?
We can answer this question by calculating a mean rank for each feature. Before we do so, we will create a separate data frame and add columns of the response data.
rank.data <- data.frame(qualtrics$` Measuring steps`,
                        qualtrics$` Calories burned`,
                        qualtrics$` Measuring heartbeat`,
                        qualtrics$` Exercise tracking`,
                        qualtrics$` Measuring distance`)
colnames(rank.data) <- c("Measuring steps", "Calories burned", "Measuring heartbeat", "Exercise tracking", "Measuring distance")
The first thing we would like to know is how many preference combinations there are and how often each one occurs. We can obtain this information by summarising the ranking data frame we created.
library(pmr)
Loading required package: stats4
test <- rankagg(rank.data)
test
n
[1,] 2 1 3 4 5 10
[2,] 1 3 2 4 5 19
[3,] 2 3 1 4 5 17
[4,] 1 2 4 3 5 4
[5,] 4 2 1 3 5 3
[6,] 3 2 1 5 4 15
[7,] 1 3 5 2 4 10
[8,] 1 2 4 5 3 10
[9,] 2 4 1 5 3 9
[10,] 1 2 5 4 3 9
[11,] 5 4 3 1 2 3
[12,] 2 3 4 5 1 8
The matrix we received as output is the summary of our ranking data. It shows, for instance, that the preference combination “2,1,3,4,5” occurs 10 times in the data frame. More specifically, this means that 10 respondents prefer item 2 (“Calories burned”) the most, then item 1 (“Measuring steps”), and so on.
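The same summary can be reproduced without pmr by pasting each respondent’s ranking into a key and tabulating it. A minimal sketch on toy data (6 respondents ranking 3 items, not our survey data):

```r
# Toy ranking data: 6 respondents ranking 3 items (1 = most preferred)
rank.toy <- data.frame(A = c(1, 1, 2, 1, 3, 2),
                       B = c(2, 2, 1, 2, 1, 1),
                       C = c(3, 3, 3, 3, 2, 3))
# Collapse each row into a single "combination" string and count occurrences
combo <- apply(rank.toy, 1, paste, collapse = ",")
table(combo)
```

Here the combination “1,2,3” appears three times, “2,1,3” twice, and “3,1,2” once, mirroring the structure of the rankagg() output above.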
Now we can calculate the mean rank for each feature and conclude which feature is the most important to our respondents:
# Mean rank of each fitness tracker feature
destat(test)$mean.rank
Descriptive statistics of ranking data:
$mean.rank: mean ranks; $pair: pairs; $mar: marginals
[1] 1.811966 2.581197 2.598291 4.051282 3.957265
As we can observe from the output, item 1 (“Measuring steps”) has the best (lowest) mean rank among all items. Therefore, we can assume that “Measuring steps” is the most important feature for our respondents. However, in order to verify this statistically and make sure it is not due to mere chance, we can conduct the Friedman rank sum test.
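The mean ranks themselves are just column means of the ranking data frame, so a base-R cross-check (again on toy data, not our survey) looks like this:

```r
# Toy ranking data: lower mean rank = more preferred
rank.toy <- data.frame(A = c(1, 1, 2, 1, 3, 2),
                       B = c(2, 2, 1, 2, 1, 1),
                       C = c(3, 3, 3, 3, 2, 3))
mean.ranks <- colMeans(rank.toy)
mean.ranks
names(which.min(mean.ranks))  # the most preferred item
```

In this toy frame item B has the lowest mean rank (1.5) and would be declared the most important.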
The Friedman rank sum test is used to identify whether there are statistically significant differences between the distributions of three or more paired groups. It is used when the normality assumptions of one-way repeated measures ANOVA are not met, or when the dependent variable is measured on an ordinal scale, as in our case.
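For reference, friedman.test() from base R’s stats package takes a matrix with one row per respondent and one column per item. A minimal sketch on toy ranks (not our survey data):

```r
# Toy data: 5 respondents ranking 3 features (rows = respondents)
m <- matrix(c(1, 2, 3,
              1, 3, 2,
              1, 2, 3,
              2, 1, 3,
              1, 2, 3), nrow = 5, byrow = TRUE)
friedman.test(m)
```

With these toy ranks the test statistic is 6.4 on 2 degrees of freedom, i.e. a significant difference between at least two of the three features at the 0.05 level.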
Before we conduct the Friedman rank sum test, we will visualise our data:
# Turn the data frame from the wide format to the long format using melt().
# The resulting data frame contains just two columns, "Feature" and "Rank".
rank.data.long <- reshape2::melt(rank.data, value.name = "Rank", variable.name = "Feature")
No id variables; using all as measure variables
attributes are not identical across measure variables; they will be dropped
head(rank.data.long)
tail(rank.data.long)
# Visualisation
ggstatsplot::ggwithinstats(
data = rank.data.long,
x = Feature,
y = Rank,
type = "np",
pairwise.comparisons = TRUE, # show pairwise comparison test results
title = "What features are important to you when evaluating fitness trackers?")
This advanced visualisation, which includes the Friedman rank sum test and pairwise comparisons, already gives us an insight into the significance of the differences among features.
# Friedman test
friedman.test(as.matrix(rank.data))
Friedman rank sum test
data: as.matrix(rank.data)
Friedman chi-squared = 176.42, df = 4, p-value < 2.2e-16
The Friedman rank sum test has a p-value lower than 0.05, so we can conclude that there are significant differences between at least two features (as we have already seen in the visualisation). Even though we have identified differences between feature preferences in the visualisation, we will conduct a post hoc test in order to demonstrate the traditional way of calculating pairwise comparisons.
knitr::kable(wilcox_test(Rank ~ Feature, paired = TRUE, p.adjust.method = "bonferroni", data = rank.data.long))
| .y. | group1 | group2 | n1 | n2 | statistic | p | p.adj | p.adj.signif |
|---|---|---|---|---|---|---|---|---|
| Rank | Measuring steps | Calories burned | 117 | 117 | 1369.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring steps | Measuring heartbeat | 117 | 117 | 2231.0 | 0.000753 | 0.008 | ** |
| Rank | Measuring steps | Exercise tracking | 117 | 117 | 354.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring steps | Measuring distance | 117 | 117 | 367.5 | 0.000000 | 0.000 | **** |
| Rank | Calories burned | Measuring heartbeat | 117 | 117 | 3214.5 | 0.512000 | 1.000 | ns |
| Rank | Calories burned | Exercise tracking | 117 | 117 | 610.5 | 0.000000 | 0.000 | **** |
| Rank | Calories burned | Measuring distance | 117 | 117 | 940.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring heartbeat | Exercise tracking | 117 | 117 | 1235.0 | 0.000000 | 0.000 | **** |
| Rank | Measuring heartbeat | Measuring distance | 117 | 117 | 1307.5 | 0.000000 | 0.000 | **** |
| Rank | Exercise tracking | Measuring distance | 117 | 117 | 3534.5 | 0.816000 | 1.000 | ns |
The output table provides p-values referring to the significance of the difference in mean ranks for each pair. For instance, the first four rows show that the differences between the mean rank of “Measuring steps” and each of the other features are significant. Consequently, we can conclude that this feature is by far the most important among our respondents.
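The Bonferroni adjustment in the table simply multiplies each raw p-value by the number of comparisons (capped at 1), which base R’s p.adjust() reproduces. The numbers below are made up for illustration only:

```r
# Toy raw p-values from three pairwise comparisons
p.raw <- c(0.010, 0.020, 0.400)
# Bonferroni: each p-value times 3, capped at 1
p.adjust(p.raw, method = "bonferroni")
```

This is why a comparison with a raw p-value of 0.020 can lose significance after adjustment: it becomes 0.06.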
Another interesting question to explore is whether there are any complementary features, or features that overlap in functionality. To examine this, we can investigate the correlation between the ranks assigned to each feature.
#Correlation Matrix
cor.matrix<-cor(rank.data, method=c('spearman'))
cor.matrix
Measuring steps Calories burned Measuring heartbeat Exercise tracking Measuring distance
Measuring steps 1.00000000 -0.04651331 -0.6569094 0.2963322 -0.05958032
Calories burned -0.04651331 1.00000000 -0.2221626 -0.1083876 -0.11694481
Measuring heartbeat -0.65690943 -0.22216264 1.0000000 -0.3255840 -0.38178948
Exercise tracking 0.29633223 -0.10838758 -0.3255840 1.0000000 -0.47176821
Measuring distance -0.05958032 -0.11694481 -0.3817895 -0.4717682 1.00000000
At first glance we observe many negative values, meaning that many features correlate negatively with each other. To make the interpretation easier, we will visualise the correlations as a correlation matrix.
library(ggcorrplot)
ggcorrplot(cor.matrix)
The correlation matrix confirms that almost all features correlate negatively with each other. An exception is the relationship between “Measuring steps” and “Exercise tracking”, which correlate positively. This matrix can be useful for digging deeper into the relationships between feature preferences. For instance, we can assume that “Measuring steps” and “Exercise tracking” correlate positively because users see them as complementary. Moreover, if we consider walking a type of exercise (in the case of longer walking routes), we can assume that users who ranked “Exercise tracking” highly also ranked “Measuring steps” highly, because they perceive it as another form of exercise tracking.
If you wish to obtain information about how much one attribute is preferred over another, you may use a constant sum scale. The total box should always be displayed at the bottom to make it easier for respondents. A constant sum question yields ratio data, with which we are able to express the relative importance of the options.
| Location | Price | Ambience | Customer Service | id |
|---|---|---|---|---|
| 32 | 23 | 32 | 13 | 1 |
| 25 | 30 | 22 | 23 | 2 |
| 19 | 21 | 30 | 30 | 3 |
| 20 | 20 | 20 | 40 | 4 |
| 30 | 30 | 10 | 30 | 5 |
| 0 | 20 | 20 | 60 | 6 |
# Compute descriptive statistics
library(pastecs)
res <- stat.desc(constant.sum)
round(res[,1:4],2)
# Creation of the long version of data frame
constant.sum.long <- melt(constant.sum[, -5], variable.name = "Factor", value.name = "Points")
constant.sum.long
# Boxplot ggplot2
p<-constant.sum.long %>%
filter(Factor!="id") %>%
ggplot(aes(x=Factor, y=Points, fill= Factor)) +
geom_boxplot() +
theme_minimal() +
ggtitle("What factors do you consider when choosing a place to go for a dinner?") +
xlab("")
ggplotly(p)
With the data collected, we are able to answer the question: which factor is most important to our respondents when they go out for dinner?
library(robCompositions)
constSum(constant.sum,100)[,-5]
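constSum() from robCompositions rescales every row so that its parts sum to the chosen total (here 100). Conceptually this is just division by the row sums, as this base-R sketch on toy answers (not our survey data) shows:

```r
# Toy constant-sum answers whose rows do not sum exactly to 100
cs <- data.frame(Location = c(30, 25), Price = c(20, 30),
                 Ambience = c(32, 22), Service = c(13, 28))
# Rescale every row to a total of 100 while preserving the ratios within a row
cs.norm <- cs / rowSums(cs) * 100
rowSums(cs.norm)
```

This kind of closure operation is useful when respondents' allocations do not add up exactly to the requested total.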
To answer this question we need to conduct a repeated measures ANOVA. This type of ANOVA is used for analyzing data where the same subjects are measured more than once. In our case, every respondent is measured on each of the factors (location, price, ambience and customer service). Repeated measures ANOVA is an extension of the paired-samples t-test and is also referred to as a within-subjects ANOVA. In a within-subjects experimental design, the same individuals are measured on the same outcome variable at different time points or under different conditions.
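With base R alone, a repeated measures ANOVA can be specified via aov() with an Error() term for the subject factor. A minimal sketch on toy data (6 subjects, 3 within-subject conditions, invented numbers):

```r
# Toy within-subjects data: every subject gives points to all three factors
d <- data.frame(
  subject = factor(rep(1:6, times = 3)),
  factor_ = factor(rep(c("Location", "Price", "Service"), each = 6)),
  points  = c(30, 25, 19, 20, 30, 10,   # Location
              23, 30, 21, 20, 30, 20,   # Price
              13, 23, 30, 40, 30, 60))  # Service
# Error(subject / factor_) tells aov() that factor_ varies within subjects
fit <- aov(points ~ factor_ + Error(subject / factor_), data = d)
summary(fit)
```

The anova_test() call from rstatix used below does the same job while also handling the sphericity correction for us.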
We need to check the assumptions that must be fulfilled in order to deploy this type of ANOVA. There are three assumptions to check. The first is that each level of the independent variable is approximately normally distributed; since we have more than 30 observations at each level, the central limit theorem lets us proceed without further checks. The second assumption concerns extreme outliers. Let’s have a look at potential outliers:
# Outliers
constant.sum.long %>%
group_by(Factor) %>%
identify_outliers(Points)
As we cannot identify any extreme outliers, we can proceed with deploying repeated measures ANOVA.
# Formatting data
constant.sum.aov <- gather(constant.sum, key = "Factor", value = "Points", ` Location`,` Price`,` Ambience`,` Customer Service`)
attributes are not identical across measure variables;
they will be dropped
# One-way repeated measures ANOVA
res.aov <- anova_test(data = constant.sum.aov, dv = Points, wid = id, within = Factor)
get_anova_table(res.aov)
ANOVA Table (type III tests)
Effect DFn DFd F p p<.05 ges
1 Factor 2.56 297.36 33.668 1.06e-16 * 0.225
# Post hoc test
pairwise.t.test(constant.sum.long$Points,constant.sum.long$Factor, paired = T, p.adjust.method = "holm")
Pairwise comparisons using paired t tests
data: constant.sum.long$Points and constant.sum.long$Factor
Location Price Ambience
Price 2.7e-15 - -
Ambience 3.2e-10 0.030 -
Customer Service < 2e-16 0.742 0.079
P value adjustment method: holm
Now we can clearly see that our respondents consider price more important than location or ambience, while customer service is perceived as almost equally important as price.
ggstatsplot::ggwithinstats(
data = constant.sum.long %>% filter(Factor!="id"), # excluding "id" column from the data
x = Factor,
y = Points,
type = "p",
pairwise.comparisons = TRUE, # show pairwise comparison test results
title = "What factors do you consider when choosing a place to go for a dinner?")
A text or number entry question is recommended if you are interested in obtaining ratio data. We will use this question type together with a constant sum question to collect data that can be analysed with regression analysis. Note that in this case we treat the constant sum data as ratio data and therefore assume that 0 means complete absence.
Here is a glimpse of the answers on how important each factor is to our respondents when dining out:
| Location | Price | Ambience | Customer Service |
|---|---|---|---|
| 32 | 23 | 32 | 1 |
| 25 | 30 | 22 | 43 |
| 19 | 21 | 30 | 34 |
| 20 | 20 | 20 | 46 |
| 30 | 30 | 10 | 17 |
| 0 | 20 | 20 | 4 |
Additionally, we asked our respondents how much they are willing to spend on dinner on average. To handle the data more easily, we will create a new data frame that merges all the data together:
dinner <- subset(qualtrics, select = c(" Location"," Price"," Ambience"," Customer Service", " Willingness-to-pay (in EUR)"))
knitr::kable(head(dinner))
| Location | Price | Ambience | Customer Service | Willingness-to-pay (in EUR) |
|---|---|---|---|---|
| 32 | 23 | 32 | 1 | 29 |
| 25 | 30 | 22 | 43 | 77 |
| 19 | 21 | 30 | 34 | 52 |
| 20 | 20 | 20 | 46 | 31 |
| 30 | 30 | 10 | 17 | 22 |
| 0 | 20 | 20 | 4 | 35 |
Before we conduct a linear regression analysis, let us take a look at the correlation matrix:
correlation <-cor(dinner, method=c('pearson'))
correlation
Location Price Ambience Customer Service Willingness-to-pay (in EUR)
Location 1.0000000 -0.31732620 -0.36134355 -0.16688104 0.14145397
Price -0.3173262 1.00000000 -0.21962027 0.08894752 -0.07438388
Ambience -0.3613436 -0.21962027 1.00000000 -0.02405881 -0.32550607
Customer Service -0.1668810 0.08894752 -0.02405881 1.00000000 0.12125571
Willingness-to-pay (in EUR) 0.1414540 -0.07438388 -0.32550607 0.12125571 1.00000000
From our data we see, for instance, some negative correlation between willingness to pay and the importance of ambience, as well as some positive correlation between the importance of customer service and willingness to pay. Let us look at the descriptive statistics as well:
knitr::kable(psych::describe(dinner))
| vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Location | 1 | 117 | 12.14530 | 10.85823 | 10 | 11.25263 | 14.8260 | 0 | 40 | 40 | 0.3585257 | -0.8903393 | 1.003844 |
| Price | 2 | 117 | 31.48718 | 16.22079 | 30 | 29.83158 | 14.8260 | 0 | 100 | 100 | 1.5662904 | 4.1917874 | 1.499613 |
| Ambience | 3 | 117 | 25.76068 | 13.97822 | 20 | 25.09474 | 14.8260 | 0 | 60 | 60 | 0.3807401 | -0.3100357 | 1.292286 |
| Customer Service | 4 | 117 | 49.35897 | 29.47777 | 47 | 49.29474 | 40.0302 | 0 | 98 | 98 | 0.0342022 | -1.1897398 | 2.725221 |
| Willingness-to-pay (in EUR) | 5 | 117 | 32.99145 | 26.26801 | 30 | 30.28421 | 29.6520 | 0 | 110 | 110 | 0.8007002 | 0.0124325 | 2.428479 |
The differences between means and medians do not suggest, at first sight, a strong influence of outliers. Let us now run the linear regression analysis:
mlr.dinner <- lm(` Willingness-to-pay (in EUR)` ~ ` Location` + ` Price` + ` Ambience`+` Customer Service`, data = dinner)
summary(mlr.dinner)
Call:
lm(formula = ` Willingness-to-pay (in EUR)` ~ ` Location` + ` Price` +
` Ambience` + ` Customer Service`, data = dinner)
Residuals:
Min 1Q Median 3Q Max
-40.810 -18.205 -3.314 14.059 74.274
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 55.31553 11.57393 4.779 5.38e-06 ***
` Location` -0.06739 0.25556 -0.264 0.792503
` Price` -0.28455 0.16117 -1.765 0.080205 .
` Ambience` -0.69755 0.19088 -3.654 0.000394 ***
` Customer Service` 0.10988 0.07931 1.386 0.168646
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 24.72 on 112 degrees of freedom
Multiple R-squared: 0.1449, Adjusted R-squared: 0.1144
F-statistic: 4.745 on 4 and 112 DF, p-value: 0.001421
Of all the factors of importance when dining out, the only one significant at the 0.05 level is ambience. From the summary we can conclude that an increase in the importance of ambience by 1 point is associated with a decrease in willingness to pay of about 0.70 EUR.
confint(mlr.dinner)
From the confidence intervals, we can conclude that when all of the given factors (location, price, ambience and customer service) are zero, willingness to pay will be somewhere between 32.38 EUR and 78.25 EUR. Besides that, for each one-point increase in the importance of ambience, there is an average decrease in willingness to pay of between 0.32 EUR and 1.08 EUR.
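These intervals follow the usual estimate ± t × SE construction. A self-contained sketch on simulated data (not our survey) confirms that this matches confint():

```r
set.seed(42)
x <- 1:20
y <- 2 * x + rnorm(20)
fit <- lm(y ~ x)
est   <- coef(summary(fit))["x", "Estimate"]
se    <- coef(summary(fit))["x", "Std. Error"]
tcrit <- qt(0.975, df = fit$df.residual)   # critical t for a 95% interval
c(lower = est - tcrit * se, upper = est + tcrit * se)
```

The two bounds are identical to the row confint(fit)["x", ] would return, which is why widening the confidence level (e.g. 99%) always widens the interval.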
ggcoefstats(x = mlr.dinner,
title = "Willingness to pay predicted by importance of factors")
There are a couple of things we need to consider when we do multiple linear regression. One of them is potential outliers in our data. Here we identify and visualize them:
# Outliers
outlier_values <- boxplot.stats(mlr.dinner$residuals)$out # outlier values.
outlier_values
We identified the observations that count as outliers. We can also visualize them:
boxplot(mlr.dinner$residuals, main = "Willingness to pay", boxwex = 0.1)
In addition, we need to observe whether there are any influential observations:
plot(mlr.dinner,4)
A rule of thumb for determining whether an observation should be classified as influential is to look for observations with a Cook’s distance greater than 1. We see from the graph that there are no influential observations.
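cooks.distance() returns the distance for every observation, so the rule of thumb can be checked programmatically. In this toy example (invented data, not our survey) one gross, high-leverage outlier is planted deliberately:

```r
# Toy data: 19 points on the line y = x plus one far-off, high-leverage point
x <- c(1:19, 30)
y <- c(1:19, 100)
fit <- lm(y ~ x)
cd <- cooks.distance(fit)
which(cd > 1)  # flags only the planted observation
```

Only observation 20 exceeds the threshold; deleting it would change the fitted line substantially, which is exactly what Cook’s distance measures.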
Another thing to consider is linearity, i.e. that the relationship between the dependent and the independent variable can be reasonably approximated in linear terms:
# Linear specification
library(car)
avPlots(mlr.dinner)
In our example it does not seem that linear relationships can be reasonably assumed for all variables.
As we already learned, another important assumption of the linear model is that the error terms have a constant variance (i.e., homoscedasticity):
# Breusch-Pagan Test
library(lmtest)
bptest(mlr.dinner)
The null hypothesis of this test is that the error variances are all equal. Our result is not significant, so this assumption is met.
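The studentized (Koenker) version of the Breusch–Pagan statistic can be computed by hand as n·R² from an auxiliary regression of the squared residuals on the predictors. A sketch on simulated homoscedastic data (assumed, not from our survey):

```r
set.seed(7)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100)   # errors with constant variance
fit <- lm(y ~ x)
aux <- lm(resid(fit)^2 ~ x)   # auxiliary regression of squared residuals
bp.stat <- length(x) * summary(aux)$r.squared
p.value <- pchisq(bp.stat, df = 1, lower.tail = FALSE)
c(statistic = bp.stat, p.value = p.value)
```

If the squared residuals are unrelated to the predictors, R² of the auxiliary regression stays small and the statistic is not significant, which is what "homoscedasticity holds" means in this framework.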
Another assumption is that the error term is normally distributed. One way to check this is to employ a statistical test with the null hypothesis that the data are normally distributed, such as the Shapiro–Wilk test:
shapiro.test(resid(mlr.dinner))
When the assumption of normally distributed errors is not met (as in our case), this might again be due to a misspecification of the model, in which case it may help to transform your data.
Finally, we need to check for multicollinearity, the case when there is a strong linear relationship between the independent variables:
correlation <-cor(dinner, method=c('pearson'))
correlation
Looking at the correlation matrix, none of the coefficients come close to 0.8 or 0.9. Consequently, we conclude that there are no concerns regarding multicollinearity between the independent variables.
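A common follow-up check is the variance inflation factor, VIF_j = 1/(1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors (car::vif() automates this). A hand computation on simulated predictors, two of which are deliberately near-collinear:

```r
set.seed(3)
x1 <- rnorm(50)
x2 <- x1 + rnorm(50, sd = 0.1)  # almost collinear with x1
x3 <- rnorm(50)                 # unrelated predictor
# VIF = 1 / (1 - R^2) from regressing each predictor on the others
vif1 <- 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared)
vif3 <- 1 / (1 - summary(lm(x3 ~ x1 + x2))$r.squared)
c(vif1 = vif1, vif3 = vif3)
```

A common rule of thumb flags VIF values above 10 as problematic: vif1 is far above that because x2 nearly duplicates x1, while vif3 stays close to 1.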